# ViT-GPT2 Architecture

Vit Gpt2 Image Captioning

An image captioning model based on the ViT and GPT2 architectures that generates natural language descriptions of input images.

Rgb Language Cap

A vision-language model trained on the COCO dataset that generates descriptive text, including the spatial relationships between entities in the image.

Transformers English

Rgb Language Cap

A spatially aware vision-language model that recognizes spatial relationships between objects in an image and generates descriptive text.

Transformers English

Vit Gpt2 Verifycode Caption

A captcha recognition model built on the ViT-GPT2 architecture and fine-tuned on a dataset of 60,000 images to identify the text in captcha images.

Image Caption Generator

A vision-language model trained on the Flickr8k dataset that generates natural language descriptions of input images.
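
The models in this category pair a ViT image encoder with a GPT2 text decoder. As a rough illustration of that pattern, the sketch below shows how a ViT-style encoder turns an image into a sequence of patch tokens and how a decoder then emits a caption one token at a time. It is not taken from any of the listed models: the 224-pixel image size, the 16-pixel patch size, the `<bos>`/`<eos>` markers, and the helper names `num_patch_tokens`, `greedy_caption`, and `toy_next_token` are all assumptions made for the example, and the toy decoder simply replays a fixed script in place of a real GPT2.

```python
from typing import Callable, List, Sequence


def num_patch_tokens(image_size: int, patch_size: int, add_cls_token: bool = True) -> int:
    """Count the tokens a ViT-style encoder produces for a square image.

    The image is cut into non-overlapping patch_size x patch_size patches;
    an optional extra classification token is prepended.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + (1 if add_cls_token else 0)


def greedy_caption(
    encoder_states: Sequence[object],
    next_token: Callable[[Sequence[object], List[str]], str],
    bos: str = "<bos>",   # hypothetical start marker
    eos: str = "<eos>",   # hypothetical end marker
    max_len: int = 16,
) -> List[str]:
    """Autoregressive decoding: ask for the next token given the image
    states and the tokens so far, append it, and stop at the end marker."""
    tokens = [bos]
    for _ in range(max_len):
        token = next_token(encoder_states, tokens)
        if token == eos:
            break
        tokens.append(token)
    return tokens[1:]  # drop the start marker


def toy_next_token(encoder_states: Sequence[object], tokens: List[str]) -> str:
    """Stand-in for a GPT2-style decoder: replays a fixed, made-up script."""
    script = ["a", "dog", "on", "grass", "<eos>"]
    return script[len(tokens) - 1]


# Assumed ViT-base style input: 224x224 image, 16x16 patches.
states = [None] * num_patch_tokens(224, 16)
caption = greedy_caption(states, toy_next_token)
print(len(states), " ".join(caption))
```

In a real captioning model the decoder would attend to the encoder states through cross-attention and pick the highest-scoring token at each step; the loop structure stays the same.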


© 2025 AIbase